We consider penalized estimation in hidden Markov models (HMMs) with multivariate Normal observations. In the moderate-to-large dimensional setting, estimation for HMMs remains challenging in practice, due to several concerns arising from the hidden nature of the states. We address these concerns by $\ell_1$-penalization of state-specific inverse covariance matrices. Penalized estimation leads to sparse inverse covariance matrices which can be interpreted as state-specific conditional independence graphs. Penalization is nontrivial in this latent variable setting; we propose a penalty that automatically adapts to the number of states $K$ and the state-specific sample sizes and can cope with scaling issues arising from the unknown states. The methodology is adaptive and very general, applying in particular to both low- and high-dimensional settings without requiring hand tuning. Furthermore, our approach facilitates exploration of the number of states $K$ by coupling estimation for successive candidate values $K$. Empirical results on simulated examples demonstrate the effectiveness of the proposed approach. In a challenging real data example from genome biology, we demonstrate the ability of our approach to yield gains in predictive power and to deliver richer estimates than existing methods.
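
To make the idea of $\ell_1$-penalizing a state-specific inverse covariance matrix concrete, the following is a minimal sketch of a generic penalized Gaussian objective for a single state. It is not the penalty proposed in the paper, whose exact form (including how it adapts to $K$ and to the state-specific sample sizes) is not given in this abstract. The function names, the fixed tuning parameter `lam`, the choice to penalize only off-diagonal entries, and the use of soft state responsibilities `resp` as weights are all assumptions made for illustration.

```python
import numpy as np


def weighted_covariance(X, resp):
    """Responsibility-weighted mean and covariance for one hidden state.

    X    : (n, p) array of observations.
    resp : (n,) array of nonnegative weights (hypothetical posterior
           state probabilities, e.g. from a forward-backward pass).
    """
    n_k = resp.sum()                      # effective sample size of the state
    mu = resp @ X / n_k                   # weighted mean
    Xc = X - mu
    S = (Xc * resp[:, None]).T @ Xc / n_k  # weighted covariance
    return mu, S, n_k


def penalized_state_objective(S, Theta, n_k, lam):
    """Generic l1-penalized Gaussian log-likelihood for one state.

    Computes n_k/2 * (log det Theta - tr(S Theta)) - lam * sum_{i != j} |Theta_ij|,
    up to an additive constant. Penalizing only off-diagonal entries is an
    assumption of this sketch.
    """
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:
        return -np.inf                    # Theta must be positive definite
    off_diag_l1 = np.abs(Theta).sum() - np.abs(np.diag(Theta)).sum()
    return 0.5 * n_k * (logdet - np.trace(S @ Theta)) - lam * off_diag_l1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    resp = np.ones(50)                    # all weight on one state, for illustration
    mu, S, n_k = weighted_covariance(X, resp)
    Theta = np.linalg.inv(S)              # unpenalized maximizer
    print(penalized_state_objective(S, Theta, n_k, lam=0.0))
    print(penalized_state_objective(S, Theta, n_k, lam=1.0))
```

In a full estimation procedure, an objective of this kind would be maximized over `Theta` for each state, which drives some off-diagonal entries to exactly zero; the zero pattern is what is read as a state-specific conditional independence graph.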